Trajectory Convolution for Action Recognition

Yue Zhao, Yuanjun Xiong, Dahua Lin

Neural Information Processing Systems




How to leverage the temporal dimension is a key question in video analysis. Recent works suggest an efficient approach to video feature learning, i.e., factorizing 3D convolutions into separate spatial and temporal convolutions. The temporal convolution, however, comes with an implicit assumption: the feature maps across time steps are well aligned, so that the features at the same locations can be aggregated. This assumption may be overly strong in practical applications, especially in action recognition, where motion serves as a crucial cue. In this work, we propose a new CNN architecture, TrajectoryNet, which incorporates trajectory convolution, a new operation for integrating features along the temporal dimension, to replace the existing temporal convolution. This operation explicitly takes into account the changes in content caused by deformation or motion, allowing the visual features to be aggregated along the motion paths, i.e., trajectories. On two large-scale action recognition datasets, namely Something-Something V1 and Kinetics, the proposed network architecture achieves notable improvement over strong baselines.
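The idea of aggregating features along motion paths rather than at fixed locations can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the single symmetric offset field `flow` (which in practice could come from optical flow or be learned, as in deformable convolution) and the `bilinear_sample` helper are assumptions made here for clarity. With a zero offset field, the operation reduces to an ordinary temporal convolution at fixed (x, y) positions.

```python
import numpy as np

def bilinear_sample(feat, y, x):
    """Sample feat (H, W, C) at fractional coords (y, x), clamping to borders."""
    H, W, _ = feat.shape
    y = np.clip(y, 0, H - 1)
    x = np.clip(x, 0, W - 1)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, H - 1), min(x0 + 1, W - 1)
    wy, wx = y - y0, x - x0
    return ((1 - wy) * (1 - wx) * feat[y0, x0]
            + (1 - wy) * wx * feat[y0, x1]
            + wy * (1 - wx) * feat[y1, x0]
            + wy * wx * feat[y1, x1])

def trajectory_conv(feats, flow, weights):
    """Temporal convolution (kernel size 3) along motion trajectories.

    feats:   (T, H, W, C) per-frame feature maps
    flow:    (T, H, W, 2) per-location offsets (dy, dx); a single symmetric
             field is assumed here, pointing backward to t-1 and forward to t+1
    weights: (3,) temporal kernel applied over (previous, current, next)
    """
    T, H, W, C = feats.shape
    out = np.zeros_like(feats)
    for t in range(T):
        for i in range(H):
            for j in range(W):
                dy, dx = flow[t, i, j]
                # Sample the previous frame where this point came from,
                # and the next frame where it is going (clamped at ends).
                prev = bilinear_sample(feats[max(t - 1, 0)], i - dy, j - dx)
                nxt = bilinear_sample(feats[min(t + 1, T - 1)], i + dy, j + dx)
                out[t, i, j] = (weights[0] * prev
                                + weights[1] * feats[t, i, j]
                                + weights[2] * nxt)
    return out
```

Plain temporal convolution is the special case `flow = 0`: every location aggregates features at the same (x, y) across frames, which is exactly the alignment assumption the abstract points out.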



Reviews: Trajectory Convolution for Action Recognition

Neural Information Processing Systems

UPDATE: Thank you to the authors for addressing my concerns. With the new version of Table 1 and the clarification of ResNet-18 vs. BN-Inception, my concern about the experimentation has been addressed -- there does seem to be a clear improvement over classical 3D convolution. I have adjusted my score upwards accordingly.

Recently, a number of new neural network models for action recognition in video have been introduced that employ 3D (spacetime) convolutions to show significant gains on large benchmark datasets. When there is significant human or camera motion, convolutions through time at a fixed (x, y) image coordinate seem suboptimal, since the person/object is almost certainly at a different position in subsequent frames.





